Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets
Abstract
Cross-Lingual Sentiment Analysis (CLSA) is the task of predicting the polarity of the opinion expressed in a text in a language L_test using a classifier trained on a corpus in another language L_train. Popular approaches use Machine Translation (MT) to convert the test document from L_test to L_train and then apply the L_train classifier. However, MT systems do not exist for most language pairs, and even where they do, their translation accuracy is low. We therefore present an alternative approach to CLSA that uses WordNet senses as features for supervised sentiment classification. A document in L_test is tested for polarity with a classifier trained on sense-marked and polarity-labeled corpora of L_train. The crux of the idea is to use the linked WordNets of the two languages to bridge the language gap. We report results on two widely spoken Indian languages, Hindi (450 million speakers) and Marathi (72 million speakers), which do not have an MT system between them. The sense-based approach gives a CLSA accuracy of 72% and 84% for Hindi and Marathi sentiment classification, respectively. This is an improvement of 14%-15% over an approach that uses a bilingual dictionary.
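The idea of using linked WordNet senses as a shared feature space can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the transliterated words, the synset IDs, the four-document training set, and the choice of a small Naive Bayes classifier are all hypothetical and are not taken from the paper. The sketch also assumes one sense per word, whereas the approach described above relies on sense-marked corpora.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicons: word -> synset ID shared across the linked WordNets.
# The words and IDs are invented for illustration only.
HINDI_SENSES = {"accha": 1001, "bura": 1002, "film": 2001, "kahani": 2002}
MARATHI_SENSES = {"changla": 1001, "vait": 1002, "chitrapat": 2001, "katha": 2002}


def to_sense_features(tokens, lexicon):
    """Map each known word to its synset ID; words missing from the lexicon are dropped."""
    return [lexicon[t] for t in tokens if t in lexicon]


def train_naive_bayes(docs, labels):
    """Collect per-class counts of synset IDs (a stand-in classifier, not the paper's)."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in zip(docs, labels):
        feature_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, feature_counts, vocab


def predict(model, feats):
    """Return the class with the highest log-probability under add-one smoothing."""
    class_counts, feature_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in feats:
            if f in vocab:  # ignore synsets never seen in training
                score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Train on (toy) Marathi documents, i.e. L_train.
train_docs = [
    (["chitrapat", "changla"], "positive"),
    (["katha", "changla"], "positive"),
    (["chitrapat", "vait"], "negative"),
    (["katha", "vait"], "negative"),
]
model = train_naive_bayes(
    [to_sense_features(tokens, MARATHI_SENSES) for tokens, _ in train_docs],
    [label for _, label in train_docs],
)

# Classify a (toy) Hindi document, i.e. L_test, without any translation step.
hindi_doc = ["film", "accha"]
prediction = predict(model, to_sense_features(hindi_doc, HINDI_SENSES))
```

Because both lexicons map to the same synset IDs, the classifier never sees language-specific words, so a model trained on one language can score documents in the other. In a real setting the lexicon lookup would be replaced by word sense disambiguation over the linked WordNets.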
Similar resources
IndoWordNet
India is a multilingual country where machine translation and cross-lingual search are highly relevant problems. These problems require large resources, like wordnets and lexicons, of high quality and coverage. Wordnets are lexical structures composed of synsets and semantic relations. Synsets are sets of synonyms. They are linked by semantic relations like hypernymy (is-a), meronymy (part-of), tro...
Cross-Lingual Sentiment Analysis Without (Good) Translation
Current approaches to cross-lingual sentiment analysis try to leverage the wealth of labeled English data using bilingual lexicons, bilingual vector space embeddings, or machine translation systems. Here we show that it is possible to use a single linear transformation, with as few as 2000 word pairs, to capture fine-grained sentiment relationships between words in a cross-lingual setting. We a...
Aspect-Level Cross-lingual Sentiment Classification with Constrained SMT
Most cross-lingual sentiment classification (CLSC) research so far has been performed at sentence or document level. Aspect-level CLSC, which is more appropriate for many applications, presents the additional difficulty that we consider subsentential opinionated units which have to be mapped across languages. In this paper, we extend the possible cross-lingual sentiment analysis settings to asp...
Cross Lingual Sentiment Analysis using Modified BRAE
Cross-Lingual Learning provides a mechanism to adapt NLP tools available for label-rich languages to achieve similar tasks for label-scarce languages. An efficient cross-lingual tool significantly reduces the cost and effort required to manually annotate data. In this paper, we use the Recursive Autoencoder architecture to develop a Cross Lingual Sentiment Analysis (CLSA) tool using sentence al...
Complex Predicates in Indian Language Wordnets
Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set to constr...